Download the Asia University homepage


In [ ]:
!wget http://www.asia.edu.tw/

Read the web page


In [ ]:
homepage = sc.textFile('./index.html',use_unicode=False)

RDD Basic Function List

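  • count() : returns the number of elements in the RDD
  • first() : returns the first element of the RDD
  • take(n) : returns the first n elements of the RDD
  • collect() : returns all elements of the RDD (takes no argument)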
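
To make the return types of these four actions concrete, here is a minimal plain-Python analogy that runs without a Spark cluster. It is only a sketch: `lines` is a made-up list standing in for an RDD, and the helper functions `count`, `first`, `take`, and `collect` are hypothetical stand-ins that mirror the RDD method names; they are not part of the Spark API.

```python
from itertools import islice

# Hypothetical stand-in for an RDD: an ordinary list of lines.
lines = ["<html>", "<head>", "<title>Asia University</title>", "</head>"]

def count(data):
    """Number of elements, analogous to rdd.count()."""
    return len(data)

def first(data):
    """First element, analogous to rdd.first()."""
    return data[0]

def take(data, n):
    """First n elements as a list, analogous to rdd.take(n)."""
    return list(islice(data, n))

def collect(data):
    """Every element as a list, analogous to rdd.collect() (no argument)."""
    return list(data)

print(count(lines))         # an integer
print(first(lines))         # a single element
print(take(lines, 2))       # a list of at most 2 elements
print(len(collect(lines)))  # collect returns the whole data set as a list
```

On a real RDD, take(n) only brings n elements back to the driver, while collect() brings everything back, so collect() is best kept for data sets that are known to be small.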

Count the total number of lines in this file


In [ ]:
print "Total:", homepage.count(), "lines"

Print the first line


In [ ]:
print homepage.first()

Print the first ten lines


In [ ]:
for line in homepage.take(10):
    print line

Check whether a string contains a given substring in Python


In [ ]:
if "亞洲" in "亞洲大學由蔡長海教授與林增連先生共同創辦":
    print "Found it"
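
The `in` operator deserves a closer look, because the page was loaded with `use_unicode=False`, which keeps each line as a UTF-8 encoded byte string rather than decoding it to unicode. The sketch below assumes the page is UTF-8 encoded; the variable names (`sentence`, `raw`, and the `has_*` flags) are illustrative only. It shows that `in` returns a boolean, and that a byte-string haystack needs a byte-string needle.

```python
# -*- coding: utf-8 -*-
sentence = u"亞洲大學由蔡長海教授與林增連先生共同創辦"

# `in` performs a substring test and returns True or False.
has_asia = u"亞洲" in sentence
has_taipei = u"台北" in sentence

# With byte strings (what use_unicode=False yields), the needle must
# also be a byte string in the same encoding (assumed here: UTF-8).
raw = sentence.encode("utf-8")
has_asia_bytes = u"亞洲".encode("utf-8") in raw

print(has_asia)        # True
print(has_taipei)      # False
print(has_asia_bytes)  # True
```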

Replace each ??? to complete the code below and count the number of lines that contain "亞洲"


In [ ]:
count = 0
for line in homepage.???:
    if "???" in line :
        count = count + 1
print count

In [ ]:
if count == 29:
    print "Good job, you got it right"
else:
    print "Please try again"

In [20]:
!sudo apt-get install libxml2-dev libxslt1-dev python-dev


Reading package lists... Done
Building dependency tree       
Reading state information... Done
python-dev is already the newest version.
The following NEW packages will be installed:
  libxml2-dev libxslt1-dev libxslt1.1
0 upgraded, 3 newly installed, 0 to remove and 138 not upgraded.
Need to get 1174 kB of archives.
After this operation, 5311 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu/ trusty/main libxslt1.1 i386 1.1.28-2build1 [140 kB]
Get:2 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libxml2-dev i386 2.9.1+dfsg1-3ubuntu4.4 [628 kB]
Get:3 http://archive.ubuntu.com/ubuntu/ trusty/main libxslt1-dev i386 1.1.28-2build1 [405 kB]
Fetched 1174 kB in 10s (115 kB/s)
Selecting previously unselected package libxslt1.1:i386.
(Reading database ... 64402 files and directories currently installed.)
Preparing to unpack .../libxslt1.1_1.1.28-2build1_i386.deb ...
Unpacking libxslt1.1:i386 (1.1.28-2build1) ...
Selecting previously unselected package libxml2-dev:i386.
Preparing to unpack .../libxml2-dev_2.9.1+dfsg1-3ubuntu4.4_i386.deb ...
Unpacking libxml2-dev:i386 (2.9.1+dfsg1-3ubuntu4.4) ...
Selecting previously unselected package libxslt1-dev:i386.
Preparing to unpack .../libxslt1-dev_1.1.28-2build1_i386.deb ...
Unpacking libxslt1-dev:i386 (1.1.28-2build1) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Setting up libxslt1.1:i386 (1.1.28-2build1) ...
Setting up libxml2-dev:i386 (2.9.1+dfsg1-3ubuntu4.4) ...
Setting up libxslt1-dev:i386 (1.1.28-2build1) ...
Processing triggers for libc-bin (2.19-0ubuntu6.6) ...

In [21]:



Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  python-lxml-dbg
The following NEW packages will be installed:
  python-lxml
0 upgraded, 1 newly installed, 0 to remove and 138 not upgraded.
Need to get 583 kB of archives.
After this operation, 2402 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-lxml i386 3.3.3-1ubuntu0.1 [583 kB]
Fetched 583 kB in 11s (52.5 kB/s)
Selecting previously unselected package python-lxml.
(Reading database ... 64625 files and directories currently installed.)
Preparing to unpack .../python-lxml_3.3.3-1ubuntu0.1_i386.deb ...
Unpacking python-lxml (3.3.3-1ubuntu0.1) ...
Setting up python-lxml (3.3.3-1ubuntu0.1) ...

In [ ]: